add ROCm configs #37

dtrifiro · 2025-05-09T12:40:50Z

This depends on https://github.com/neuralmagic/nm-cicd/pull/103, which lets the accuracy/guidellm workflows actually use the accelerator-specific overrides.

- mistralai/Mixtral-8x7B-Instruct-v0.1: add ROCm accuracy server override (avoids using tensor-parallel-size=8)
- meta-llama/Llama-3.1-8B-Instruct: add accuracy/server-rocm override (lower gpu_memory_utilization to avoid OOM errors)
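For illustration, here is a minimal sketch of what the Mixtral `accuracy/server-rocm.yml` override might contain. It is hypothetical. The key names are assumptions modeled on vLLM's `--tensor-parallel-size` and `--enforce-eager` server flags, not the confirmed nm-cicd schema. The values follow later commits in this PR (tensor-parallel-size=2, enforce-eager on ROCm).

```yaml
# Hypothetical override: mistralai/Mixtral-8x7B-Instruct-v0.1/accuracy/server-rocm.yml
# ASSUMPTION: keys mirror vLLM server flags; the actual nm-cicd schema may differ.
# No `model:` key, following the review comment on this file.
tensor-parallel-size: 2   # shard across 2 GPUs instead of 8
enforce-eager: true       # skip graph capture, which reserves extra GPU memory
```

Keeping the override this small means the ROCm file records only what differs from the default server config.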

dhuangnm · 2025-05-09T19:37:58Z

mistralai/Mixtral-8x7B-Instruct-v0.1/accuracy/server-rocm.yml

@@ -0,0 +1,4 @@
+# https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
+model: 'mistralai/Mixtral-8x7B-Instruct-v0.1'


The model config here can be removed.


dtrifiro force-pushed the rocm-configs branch from 790817c to 049aaf1 on May 9, 2025 15:01

dhuangnm reviewed May 9, 2025

dtrifiro force-pushed the rocm-configs branch 3 times, most recently from cec96c2 to bae1481 on May 12, 2025 16:21

dtrifiro added 9 commits May 13, 2025 11:27

mistralai/Mixtral-8x7B-Instruct-v0.1: add rocm accuracy server override

a91f4c4

Llama-3.1-8B-Instruct add accuracy/server-rocm.

0640959

meta-llama/Llama-3.1-8B-Instruct: decrease gpu_memory_utilization to prevent OOM (ROCm)

049f3b3

mistralai/Mistral-Small-3.1-24B-Instruct-2503: add rocm server override

ba9cac2

add rocm accuracy overrides for Mistral 24b and Ph

08894ea

reduce ROCm gpu memory gpu_utilization

736fd47

mistralai/Mixtral-8x7B-Instruct-v0.1: set tensor-parallel-size=2

c032e04

cleanup configs

51867fe

rocm: use enforce-eager to avoid OOM errors

cfa2b6f

dtrifiro force-pushed the rocm-configs branch from bae1481 to cfa2b6f on May 13, 2025 13:40

Qwen: override gpu-memory-utilization=0.6

483b7d2
